语言的自动处理在我们的生活中普遍存在,经常在我们的决策中扮演核心角色,例如为我们的消息和邮件选择措辞,翻译我们的读物,甚至与我们进行完整的对话。单词嵌入是现代自然语言处理系统的关键组成部分。它们提供了一种词的表示,从而提高了许多应用程序的性能,从而是含义的表现。单词嵌入似乎可以捕捉到原始文本中单词的含义的外观,但与此同时,它们还提炼了刻板印象和社会偏见,后来传达给最终应用。这样的偏见可能是歧视性的。检测和减轻这些偏见,以防止自动化过程的歧视行为非常重要,因为它们的规模可能比人类更有害。目前,有许多工具和技术可以检测和减轻单词嵌入中的偏见,但是它们为没有技术技能的人的参与带来了许多障碍。碰巧的是,大多数偏见专家,无论是社会科学家还是对偏见有害,没有这样的技能的环境,并且由于技术障碍而无法参与偏见检测过程。我们研究了现有工具中的障碍,并与不同种类的用户探索了它们的可能性和局限性。通过此探索,我们建议开发一种专门旨在降低技术障碍的工具,并提供探索能力,以满足愿意审核这些技术的专家,科学家和一般人的要求。
translated by 谷歌翻译
In both terrestrial and marine ecology, physical tagging is a frequently used method to study population dynamics and behavior. However, such tagging techniques are increasingly being replaced by individual re-identification using image analysis. This paper introduces a contrastive learning-based model for identifying individuals. The model uses the first parts of the Inception v3 network, supported by a projection head, and we use contrastive learning to find similar or dissimilar image pairs from a collection of uniform photographs. We apply this technique for corkwing wrasse, Symphodus melops, an ecologically and commercially important fish species. Photos are taken during repeated catches of the same individuals from a wild population, where the intervals between individual sightings might range from a few days to several years. Our model achieves a one-shot accuracy of 0.35, a 5-shot accuracy of 0.56, and a 100-shot accuracy of 0.88, on our dataset.
translated by 谷歌翻译
Data-driven models such as neural networks are being applied more and more to safety-critical applications, such as the modeling and control of cyber-physical systems. Despite the flexibility of the approach, there are still concerns about the safety of these models in this context, as well as the need for large amounts of potentially expensive data. In particular, when long-term predictions are needed or frequent measurements are not available, the open-loop stability of the model becomes important. However, it is difficult to make such guarantees for complex black-box models such as neural networks, and prior work has shown that model stability is indeed an issue. In this work, we consider an aluminum extraction process where measurements of the internal state of the reactor are time-consuming and expensive. We model the process using neural networks and investigate the role of including skip connections in the network architecture as well as using l1 regularization to induce sparse connection weights. We demonstrate that these measures can greatly improve both the accuracy and the stability of the models for datasets of varying sizes.
translated by 谷歌翻译
Manually analyzing spermatozoa is a tremendous task for biologists due to the many fast-moving spermatozoa, causing inconsistencies in the quality of the assessments. Therefore, computer-assisted sperm analysis (CASA) has become a popular solution. Despite this, more data is needed to train supervised machine learning approaches in order to improve accuracy and reliability. In this regard, we provide a dataset called VISEM-Tracking with 20 video recordings of 30s of spermatozoa with manually annotated bounding-box coordinates and a set of sperm characteristics analyzed by experts in the domain. VISEM-Tracking is an extension of the previously published VISEM dataset. In addition to the annotated data, we provide unlabeled video clips for easy-to-use access and analysis of the data. As part of this paper, we present baseline sperm detection performances using the YOLOv5 deep learning model trained on the VISEM-Tracking dataset. As a result, the dataset can be used to train complex deep-learning models to analyze spermatozoa. The dataset is publicly available at https://zenodo.org/record/7293726.
translated by 谷歌翻译
Head and neck cancers are the fifth most common cancer worldwide, and recently, analysis of Positron Emission Tomography (PET) and Computed Tomography (CT) images has been proposed to identify patients with a prognosis. Even though the results look promising, more research is needed to further validate and improve the results. This paper presents the work done by team MLC for the 2022 version of the HECKTOR grand challenge held at MICCAI 2022. For Task 1, the automatic segmentation task, our approach was, in contrast to earlier solutions using 3D segmentation, to keep it as simple as possible using a 2D model, analyzing every slice as a standalone image. In addition, we were interested in understanding how different modalities influence the results. We proposed two approaches; one using only the CT scans to make predictions and another using a combination of the CT and PET scans. For Task 2, the prediction of recurrence-free survival, we first proposed two approaches, one where we only use patient data and one where we combined the patient data with segmentations from the image model. For the prediction of the first two approaches, we used Random Forest. In our third approach, we combined patient data and image data using XGBoost. Low kidney function might worsen cancer prognosis. In this approach, we therefore estimated the kidney function of the patients and included it as a feature. Overall, we conclude that our simple methods were not able to compete with the highest-ranking submissions, but we still obtained reasonably good scores. We also got interesting insights into how the combination of different modalities can influence the segmentation and predictions.
translated by 谷歌翻译
人工神经网络今天具有广泛的应用程序,因为它们的高度灵活性和从数据中建模非线性功能的能力。但是,由于其黑盒性质,从小型数据集概括的能力差以及在培训期间的不一致的融合,神经网络的可信度受到限制。铝电解是一个复杂的非线性过程,具有许多相互关联的子处理。人工神经网络可能非常适合对铝电解过程进行建模,但是此过程的安全性最关键的性质需要值得信赖的模型。在这项工作中,稀疏的神经网络经过训练,以建模铝电解模拟器的系统动力学。与相应的密集神经网络相比,稀疏模型结构的模型复杂性显着降低。我们认为这使模型更容易解释。此外,实证研究表明,稀疏模型比密集的神经网络从小型训练集中概括得更好。此外,训练具有不同参数初始化的稀疏神经网络的合奏表明,模型会收敛到具有相似学习的输入特征的相似模型结构。
translated by 谷歌翻译
在这项工作中,我们认为寻找人工通用智能(AGI)应该从比人类水平的智能低得多的水平开始。自然界中智能行为的环境是由于有机体与周围环境相互作用的情况,这种环境可能会随着时间的流逝而改变,并对有机体施加压力,以便学习新的行为或环境模型。我们的假设是,学习是通过解释代理在环境中作用时的感觉反馈而发生的。为此,需要一个身体和反应性环境。我们评估了一种进化生物学启发的人工神经网络的方法,该神经网络从名为“人工通用智能的神经进化”(Nagi)的环境反应中学习,这是一个低水平AGI的框架。该方法允许使用自适应突触的随机启用尖峰神经网络的进化络合,该神经网络控制在可变环境中实例化的代理。这种配置使我们能够基准基准控制器的适应性和通用性。可变环境中所选的任务是食品觅食,逻辑门的仿真和卡特杆平衡。这三个任务通过相当小的网络拓扑成功解决,因此,它打开了实验更复杂的任务和方案的可能性,其中课程学习是有益的。
translated by 谷歌翻译
我们考虑使用最新的MultieRlex数据集中考虑法律主题分类中的零射击跨语性转移。由于原始数据集包含并行文档,这对于零拍传输不现实是不现实的,因此我们开发了一个没有并行文档的数据集的新版本。我们使用它来表明,基于翻译的方法非常优于多绘制预训练的模型,这是多曲线的最佳先前的零弹性传输方法。我们还开发了一种双语的教师零摄像转移方法,该方法利用了目标语言的其他未标记文档,并且比直接在标记的目标语言文档上进行微调的模型更好。
translated by 谷歌翻译
在实践中,缺少数据是一个通常发生的问题。已经开发了许多插补方法来填写缺失的条目。但是,并非所有这些都可以扩展到高维数据,尤其是多个插补技术。同时,如今的数据趋于高维。因此,在这项工作中,我们提出了主要成分分析插补(PCAI),这是一个基于主成分分析(PCA)的简单但多才多艺的框架,以加快插补过程并减轻许多可用的插补技术的记忆问题,而无需牺牲插补质量质量在MSE任期。此外,即使某些或全部缺少的功能是分类的,或者缺少功能的数量很大,框架也可以使用。接下来,我们介绍PCA插补 - 分类(PIC),这是PCAI在分类问题中的应用,并进行了一些调整。我们通过对各种情况进行实验来验证我们的方法,这表明PCAI和PIC可以使用各种插入算法(包括最先进的算法),并显着提高插补速度,同时在获得竞争性的均方误差/分类精度相比,指导插补(即直接将其插入丢失的数据)。
translated by 谷歌翻译
在Mediaeval第一次提供视觉情绪分析任务。任务的主要目的是预测对社交媒体共享的自然灾害图像的情绪反应。与灾害相关的图像通常很复杂,并且经常唤起情绪反应,使其成为视觉情绪分析的理想用例。我们认为能够对自然灾害有关的数据进行有意义的分析可能具有很大的社会重要性,这方面的共同努力可以为未来的研究开辟几个有趣的方向。该任务由三个子任务组成,每个任务旨在探索挑战的不同方面。在本文中,我们提供了任务的详细概述,任务的一般动机,以及数据集的概述以及用于评估所提出的解决方案的指标。
translated by 谷歌翻译